Derivation and internal validation of the multivariate toxigenic C. difficile diarrhea model and risk score for emergency room and hospitalized patients with diarrhea

Background: Many factors have been associated with the risk of toxigenic C. difficile diarrhea (TCdD). This study derived and internally validated a multivariate model for estimating the risk of TCdD in patients with diarrhea using readily available clinical factors. Methods: A random sample of 3,050 symptomatic emergency department or hospitalized patients undergoing testing for toxigenic C. difficile at a single teaching hospital between 2014 and 2018 was created. Unformed stool samples positive for both glutamate dehydrogenase antigen by enzyme immunoassay and tcdB gene by polymerase chain reaction were classified as TCdD positive. The TCdD Model was created using logistic regression and was modified to the TCdD Risk Score to facilitate its use. Results: 8.1% of patients were TCdD positive. TCdD risk increased with abdominal pain (adjusted odds ratio 1.3; 95% CI, 1.0–1.8), previous C. difficile diarrhea (2.5, 1.1–6.1), and prior antibiotic exposure, especially when sampled in the emergency department (4.2, 2.5–7.0) versus the hospital (1.7, 1.3–2.3). TCdD risk also increased when testing occurred earlier during the hospitalization encounter, when age and white cell count increased concurrently, and with decreased eosinophil count. In internal validation, the TCdD Model had moderate discrimination (optimism-corrected C-statistic 0.65, 0.62–0.68) and good calibration (optimism-corrected Integrated Calibration Index [ICI] 0.017, 0.001–0.022). Performance decreased slightly for the TCdD Risk Score (C-statistic 0.63, 0.62–0.63; ICI 0.038, 0.004–0.038). Conclusions: TCdD risk can be predicted using readily available clinical risk factors with modest accuracy.


Introduction
Clostridioides difficile is the most common infectious cause of healthcare-associated diarrhea. 1It can impose a significant burden on the healthcare system and can cause life-threatening complications including pseudomembranous colitis, toxic megacolon, colonic perforation, sepsis, and death. 2 Diarrhea is very common in hospitalized patients, occurring in 12%-32% of hospitalized patients with less than one-fifth caused by toxigenic C. difficile. 3It would be helpful to be able to predict the probability that a patient in the emergency room or hospital with diarrhea has toxigenic C. difficile diarrhea (TCdD).Patients with a high risk of TCdD might be immediately started on empiric therapy, potentially reducing their risk of developing lifethreatening complications.Patients with a high risk of TCdD who test negative might have their stool analysis repeated or proceed to colonoscopy for macroscopic evidence of disease.Finally, stool analysis might be avoided for patients with a low risk of TCdD, thereby decreasing test utilization.
Published systematic reviews and meta-analyses have associated C. difficile infection with several demographic factors, historical data, medication exposures, physical examination findings, and laboratory results (Table 1).C. difficile infection risk and severity increases with increasing age. 4,5Patients in hospital for prolonged stays have a greater risk of hospital-acquired C. difficile infection. 16Previous TCdD was the historical factor having the strongest association with current TCdD risk. 4ntibiotic [4][5][6][7] and proton pump inhibitor 4,5,10 exposure consistently increased TCdD risk, while exposure to histamine-2 receptor antagonists 11 and steroids 12 has also been associated with increased risk.The only physical finding consistently associated with TCdD risk was hypotension. 15Finally, patients with TCdD are more likely to have leukocytosis 4 , elevated creatinine, 4,5,13 and hypoalbuminemia. 13However, the independent influence that these factors have on TCdD risk is currently unknown.Models exist the predict the likelihood of asymptomatic toxigenic C. difficile carriage in the elderly, 17 the risk that currently asymptomatic people subsequently develop TCdD, [18][19][20] and the probability of ominous outcomes in TCdD patients. 21,22A large number of factors have been associated with C. difficile infection (Table 1), but no model exists that returns the probability that a patient with diarrhea has TCdD.In this study, we took a random sample of patients at our hospital from 2014 to 2018 who underwent toxigenic C. difficile testing to derive and internally validate a model that estimates the probability of TCdD.

Study setting
This study took place at the Ottawa Hospital, a 1,000-bed teaching hospital with 2 campuses that is the tertiary referral institution and trauma center for a region of approximately 1.3 million people.
Annually, the Ottawa Hospital has more than 175,000 emergency department visits, 40,000 non-psychiatric admissions, and performs more than 50,000 surgical procedures.The study protocol was approved by our research ethics board and adhered to throughout (File 20210372-01H).

Derivation of the sampling frame and sample size calculation
We created the study's sampling frame using our hospital's data warehouse to identify all stool analyses for toxigenic C. difficile conducted in our emergency department or hospital between March 15, 2014 (before which results were unavailable in our data warehouse) and December 31, 2018 (final date for which data was available when the study was initiated).Of these 29,311 specimens, we excluded those tests that were not the first sample for each patient (n = 11,608).We included only the first sample taken to be a proxy for whether patients presented with diarrhea or developed it during their admission.From the remaining 17,703 specimens, we randomly selected 3,100 stool samples (17.5% of the final sampling frame) using the RANUNI random number function in SAS.This sample size was determined using methods presented by Riley et al. 23 We knew that 8.7% of stool samples tested positive for toxigenic C. difficile, and we conservatively projected a need for 44 degrees of freedom in our model (Appendix A).We then calculated 4 required sample sizes based upon (1) a 1% error in the expected probability from the null model's intercept (sample size = 3,060), (2) a 5% mean error around individual predictions (sample size = 1,010), (3) target shrinkage of 80% (sample size = 2,332), and (4) an optimism of 0.05 (sample size = 1,743).We took the maximum of these estimates (n = 3,060) and, anticipating an exclusion rate (due to poor documentation or unavailable records) of between 1% and 1.5%, settled on the final sample size of 3,100 patients.

Outcome
Unformed stool samples from patients with suspected C. difficile infection were tested using a 2-step algorithm.Formed stool samples were rejected by the laboratory and not analyzed.Samples were initially screened for the C. difficile-specific glutamate dehydrogenase common antigen (GDH) by enzyme-linked immunosorbent assay (CDiffCHEK-60, TECHLAB, USA).GDH-positive samples were further tested for the presence of the tcdB gene using the Cdiff Simplexa® Universal PCR (Diasorin Molecular, USA).Patients with GDH-positive, polymerase chain reaction (PCR)-positive samples were classified with TCdD; all other patients were classified as TCdD negative.

Data collection
To identify potential factors that we should offer to our prediction model, we first identified all covariates found to be significantly associated with any type of TCdD in published systematic reviews or meta-analyses regarding C. difficile diarrhea (Table 1).In addition to these 15 covariates, we identified 2 other covariates that we thought might be associated with TCdD (eosinophil count and hospitalization day of sampling) based on personal experience and cohort studies. 15,16e then captured values for each of these factors for all patients in our sample.We retrieved values for ten covariates (age, testing location, hospitalization antibiotic exposure, gastric acid suppression therapy, steroid exposure, and laboratory values) from our hospital data warehouse and the remaining covariates (comorbidity status, previous C. difficile infection, abdominal pain, systolic blood pressure) from medical record review by trained abstractors (SD, JZ, YY, EB) who were licensed and practicing physicians between June and December 2021 (Appendix A).The abstraction database interface blinded abstractors to TCdD status.Patients were classified as exposed to antibiotics during their encounter if they had been given any member of the beta-lactam, carbapenems, sulpha, aminoglycoside, macrolides, fluoroquinolone, tetracycline, lincosamide classes, or metronidazole prior to toxigenic C. difficile testing.Patients were classified as exposed to antibiotics prior to their encounter if they were prescribed oral or intravenous antibiotics (other than metronidazole or vancomycin) within the previous 3 months according to treating physician notes.Steroid exposure included prednisone, prednisolone, dexamethasone, or methylprednisolone prior to or during their encounter.We abstracted vital signs closest to and prior to toxigenic C. difficile testing.

Analysis
We used logistic regression (SAS 9.4.2,Cary, NC) to measure the adjusted association of the selected variables with TCdD status.To control for the excessive influence of outliers, all continuous variables were Winsorized (ie, values below and above the 1st and 99th percentile were assigned those values, respectively).
Missing data was an issue for the laboratory tests, with the prevalence of missing values ranging from 3.5% for white blood cell count to 44.6% for serum lactate.Because these variables had skewed distributions, they were first log transformed to improve their imputation.We then used PROC MI to apply the expectation-maximization (EM) algorithm to compute the maximum likelihood estimates for imputation.All variables in the analytical data set, including outcomes, were used in the imputation model with 20 imputations.
To create the TCdD Model, we used a continuous variable transformation identification algorithm from Sauerbrei et al 24 which combined backward elimination with an adaptive transformation identification algorithm to select the best fractional polynomial transformation for each continuous variable's adjusted association with TCdD status using an alpha-error inclusion criterion of 0.1.Each polynomial included 2 terms and consumed 4 degrees of freedom (2 for the algorithm and 2 for the variable). 25ur original model-building strategy planned to offer all candidate variables (Appendix A) to the algorithm.We did not offer lactate to the model because it was missing in 44.6% of patients.In addition, we offered 2 interactions to the model: (1) an interaction between age and white blood cell count because the latter increases less in older patients 26,27 and (2) an interaction between sampling location and previous antibiotic status because a classification tree (for an analysis not reported here) showed a strong interaction between these 2 variables.This observation might reflect that preadmission antibiotic exposure would be more completely documented in the consultation note for patients presenting with diarrhea (and, therefore, sampled in the emergency department).This model-building strategy was applied to each of the 20 imputation samples.The variables and transformations most commonly selected in the samples were used for the final variable selection.This model was fit in each imputation sample and final pooled parameter estimates were determined using PROC MIANALYZE.
Model performance was assessed using internal validation within 1,000 bootstrap samples.All reported performance statistics accounted for optimism using methods described by Steyerberg. 28e calculated the C-statistic (for discrimination) and the integrated calibration index (ICI for calibration) 29 and used the bootstrap percentile method to calculate 95% confidence limits. 30-statistics range from 0.5 to 1 with higher values indicating better discrimination.The ICI calculates the absolute difference between true and expected probability using a LOcal regESSion (LOESS) regression model of true values regressed on expected probabilities.ICI values range from 0 to 1 with lower values indicating better calibration.Finally, we used the methods described by Sullivan et al 31 to reduce the final model to a points-based risk score, which we refer to as the TCdD Risk Score.The performance of this score was determined with the optimism-corrected C-statistic and ICI in 1,000 bootstrap samples.

Results
Of the 3,100 randomly selected patients, 50 (1.6%)were excluded because no vital signs were recorded before the sample was taken (n = 36), no complete admission or consult note was present (n = 10), or medical record access was restricted by the patient (n = 4).This left 3,050 patients in the study cohort (Table 2).Patients had a mean age of 63.1 years (SD, 18.0), and most were admitted from the emergency room.Abdominal pain was present in almost a third of patients.Previous toxigenic C. difficile diarrhea (TCdD) was uncommon (1.2%).Antibiotic use prior to admission was recorded in a third of patients with antibiotic administration during the hospitalization (prior to sampling) recorded in more than half.Albumin and lactate were not measured in 22.3% and 44.6% of patients, respectively.
The TCdD Model included 4 binary variables and 4 continuous variables along with 2 interaction terms (Appendix B).The presence of abdominal pain (adjusted odds ratio [adjOR] 1.3 [95% CI, 1.0-1.8])and a history of previous TCdD (adjOR 2.5 [1.1-6.1]) were both independently associated with an increased risk of TCdD (Table 3).Sampling location interacted significantly with previous antibiotic exposure: increased TCdD risk associated with previous antibiotic exposure was significantly greater when patients had their stool sample taken in the emergency room (adjOR 4.2 [95% CI, 2.5, 7.0]) than when patients had their stool sample taken in hospital (adjOR 1.7 [1.3, 2.3]).In contrast, sampling location had no influence on TCdD risk in patients without previous antibiotic exposure.TCdD risk decreased with time from presentation to testing (Figure 1A) and eosinophil count (Figure 1B).Patient age and white blood cell (WBC) count interacted significantly (Appendix B).TCdD risk increased with rising WBC count predominantly when patients were older than 60 years (Figure 2).Similarly, TCdD risk increased with rising age primarily when WBC count exceeded 10 × 10 9 /L.
The TCdD Risk Score is presented in Table 4.A 1-point increase in the score represents the increase in TCdD risk associated with being sampled in the emergency department without previous antibiotic exposure.In our sample, the TCdD Figure 2. Interaction of patient age and white cell count on adjusted toxigenic C. difficile diarrhea (TCdD) risk.This heat map presents the adjusted likelihood of TCdD risk as a function of patient age (horizontal axis) and WBC count (left vertical axis).TCdD risk is expressed as an adjusted odds ratio (right vertical axis) relative to that of a 60-year-old patient with a WBC count of 11 (with an adjusted odds ratio of 1.0, presented as white).Adjusted odds ratios less than 1.0 (indicating lower TCDD risk) are blue; adjusted odds ratios exceeding 1.0 (indicating higher TCdD risk) are red.See Appendix B for the entire TCdD Risk Model.

Discussion
In a randomly sampled and large cohort of symptomatic patients at a single hospital, this study measured the independent association of factors previously identified to be associated with toxigenic C. difficile diarrhea (TCdD).We identified several factors that were independently associated to create the TCdD Model, which had moderate discrimination and good calibration.Predictive performance was slightly reduced in the TCdD Risk Score.We conclude that TCdD risk can be predicted in symptomatic patients using readily available clinical factors with modest accuracy.
The TCdD Model performed with modest accuracy despite using optimal model construction methods.First, we reviewed the literature to identify all factors significantly associated with TCdD risk in systematic reviews or meta-analyses.We then determined the status of these variables in an appropriately sized and randomly selected cohort.Therefore, we believe it is unlikely that our model's middling predictive performance stems from missing a known important predictor or an inadequate sample size.Second, model construction used all of the techniques recommended in the Prediction model Risk Of Bias ASsessment Tool (PROBAST) criteria 32 : we used appropriate data sources and inclusion criteria; all predictors were precisely defined, were assessed without knowing outcome status, and are available to clinicians; the outcome was determined in a universal, standard fashion independent of predictor status; and we used a large patient cohort, analyzed continuous variables appropriately, appropriately accounted for missing data, included almost all randomly selected patients in our analysis, avoided predictor selection based on univariate analysis, and accounted for optimism when measuring model performance.Therefore, we don't believe that the TCdD Model's modest performance was due to substandard methodologies.
Several issues should be kept in mind when interpreting our results.First, although our study included a cohort that was large and randomly sampled from a valid and inclusive sampling frame, it was limited to a single institution.The validity of our model should be determined in an external population, as opposed to the interval validation with bootstrap sampling used in this study.Second, data collection was not prospective.This trait is most relevant to factors reliant upon physician documentation, such as abdominal pain and previous antibiotic use.Our analysis found that these factors were both independently associated with TCdD  4, black).Optimism-adjusted fit statistics in the internal validation population (top left) were calculated on 1,000 bootstrap samples.
status (Table 3); misclassification of these important variables due to inaccurate physician documentation could mute their true association with TCdD risk.Therefore, TCdD prediction accuracy might improve with prospective data collection.Third, laboratory staff in our hospital exclude formed stools sent for C. difficile toxin analysis.Therefore, our study included only patients with diarrhea.However, there are 2 situations in which study patients with C. difficile toxin-positive stool could be misclassified with TCdD: patients colonized with C. difficile with diarrhea of another cause and patients with formed stool whose sample was incorrectly analyzed.We believe it to be unlikely that the prevalence of these situations was high enough to importantly alter results.Fourth, although our medical record abstractors were not told each patient's TCdD status, they might have determined this diagnosis during abstraction which could have influenced values of abstracted covariates.Because abstractors focused on consult  Note.The TCdD Risk Score is calculated by adding up the points assigned to a patient's status for abdominal pain, previous TCdD, antibiotic exposure prior to hospitalization, location of sampling, patient age, and white blood cell count.TCdD probability by risk score is provided in the bottom table.In a 1,000-bootstrap internal validation, the optimism-corrected C-statistic was 0.627 (95% CI, 0.620-0.629),and the optimism-corrected integrated calibration index was 0.0376 (95% CI, 0.005-0.038).
notes and vital signs (as opposed to microbiology results), we believe that such misclassification is unlikely to be meaningful.Fifth, "missingness" was especially common for laboratory data, especially albumin and lactate.Although we accounted for this using appropriate multiple imputation methods, model performance might improve if all patients had these tests measured, assuming that these factors are importantly predictive of TCdD status.Finally, we do not believe that our model's performance was strong enough to be immediately impactful to clinicians-such as excluding patients for C. difficile toxin testing for patients with a predicted low risk of the disease.However, the predicted TCdD risk could be used by clinicians for risk stratifying patients; for example, patients with an elevated predicted TCdD risk whose initial testing is negative might undergo repeat testing or other related studies for C. difficile such as colonoscopy.If new TCdD predictive factors are identified in the future, our model could serve as a substrate for enhancing TCdD predictive models.
In conclusion, these data show that TCdD risk can be predicted using readily available clinical factors with modest accuracy.Further study is required to determine whether other factors improve TCdD prediction capability.

Figure 1 .
Figure 1.Adjusted association of hospital day and eosinophil count on toxigenic C. difficile diarrhea (TCdD).The adjusted association of hospital day of sample (A) and eosinophil count (B) with TCdD is presented.In both plots, the association is presented as the adjusted odds ratio (vertical axis) accounting for all other variables in the TCdD Model (Appendix B) relative to midpoint values (day 9 and eosinophil count of 0.25).

Figure 3 .
Figure 3. Calibration plot of toxigenic C. difficile diarrhea (TCdD) model and score.This figure plots the observed TCdD probability (vertical axis) against the expected TCdD probability (horizontal axis) TCdD Model (Appendix B, blue) and the TCdD probability risk score (Table4, black).Optimism-adjusted fit statistics in the internal validation population (top left) were calculated on 1,000 bootstrap samples.

Table 1 .
Summary of published studies measuring the association of factors with toxigenic C. difficile diarrhea (TCdD) related outcomes Note.Up arrow (↑) indicates that variable was associated with a significantly increased risk of outcome.Down arrow (↓) indicates that variable was associated with a significantly decreased risk of outcome.Black arrow indicates that a significant association was adjusted for other factors.Gray arrows indicate that significant association was unadjusted.Dash (-) indicates no significant association between outcome and variable.(CKD, chronic kidney disease; DEMO, demographics; PE, physical examination; MA, meta-analysis; SR, systematic review; C, cohort; IBD, inflammatory bowel disease; PPI, proton pump inhibitors; H2RA, histamine-2 receptor antagonist; WBC, white blood cell)

Table 3 .
Influence of binary variables in toxigenic C. difficile diarrhea (TCdD) model Note.The influence of all binary variables in the TCdD Risk Model (Appendix B) is presented.(OR, odds ratio; CI, confidence interval; CDAD, C. difficile-associated diarrhea; Dept., Department)